现代处理器使用一种称为 虚拟寻址。 虚拟内存(VM) 是对主存的一种抽象,为每个进程提供一个私有的、连续的 线性地址空间。
1. 寻址方式的演进
在 物理寻址 (图 9.1)中,CPU 直接向 DRAM 发送一个 物理地址(PA) 。在 虚拟寻址 (图 9.2)中,处理器生成一个 虚拟地址(VA),该地址由 内存管理单元(MMU) 转换为物理地址(PA),然后才访问内存。
2. 层次结构与缓存
DRAM 作为磁盘存储的 DRAM 缓存 。由于磁盘延迟较高,系统采用 写回策略。地址翻译通过 TLB ,利用 TLB 索引(TLBI) 并由如 PROT_WRITE等保护位来保障安全。大地址空间($N = 2^n$)支持复杂的 段 以及 操作系统对输入/输出设备的服务。
main.py
TERMINALbash — 80x24
> Ready. Click "Run" to execute.
>
QUESTION 1
[Fill in the blanks] Complete the following table for a Virtual Address Space where $N = 2^n$ and the largest address is $N-1$. Use units: $K=2^{10}, M=2^{20}, G=2^{30}, T=2^{40}, P=2^{50}, E=2^{60}$.
n=16: N=64K, Max=0xFFFF; n=32: N=4G, Max=0xFFFFFFFF; n=48: N=256T, Max=0xFFFFFFFFFFFF; n=64: N=16E, Max=0xFFFFFFFFFFFFFFFF
n=16: N=16K; n=32: N=32M; n=48: N=48G; n=64: N=64T
✅ Correct!
Correct. Virtual address space growth is exponential. $2^{16}=64K$, $2^{32}=4G$, $2^{48}=256T$, and $2^{64}=16E$.❌ Incorrect
Recall that $N = 2^n$. $2^{10}$ is 1K, $2^{20}$ is 1M, $2^{30}$ is 1G, etc.QUESTION 2
Why does the DRAM cache use a Write-back policy instead of Write-through?
To maintain Stack Discipline.
Due to the massive latency penalty of accessing disk storage.
To prevent internal fragmentation.
Because the L3 unified cache is 16-way associative.
✅ Correct!
Correct. Synchronizing with the disk on every write would be too slow, so updates are deferred until swapping occurs.❌ Incorrect
Disk access is roughly 100,000 times slower than DRAM; writing back only when necessary is vital.QUESTION 3
What happens if a process attempts to write to a page where the PTE has PROT_WRITE set to 0?
A General protection fault occurs.
The MMU performs Swapping automatically.
Deferred coalescing is triggered.
The Reference bit (A bit) is cleared.
✅ Correct!
Correct. The MMU hardware checks protection bits during address translation; violations trigger an exception (segfault/GPF).❌ Incorrect
Protection bits are enforced by hardware to ensure process isolation in the virtual address space.QUESTION 4
Which component is responsible for translating the Virtual Address to a Physical Address?
The L3 unified cache (8 MB, 16-way).
The Memory Management Unit (MMU).
The Rio Package.
The Buddy System Allocator.
✅ Correct!
Correct. The MMU is the hardware that performs the mapping using page tables and the TLB.❌ Incorrect
The MMU sits between the CPU and main memory specifically for this translation.QUESTION 5
What is the physical address (PA) if the MMU finds a mapping for a VA where VPN=0x0d and VPO=0x14, given PPN=0x0d?
PA = 0x354: PPN=0x0d, PPO=0x14
$PA = 0x140d$
$PA = 0x0001$
PA = 0x354: VPN=0x0d, VPO=0x14
✅ Correct!
Correct. PPO is identical to VPO. PA is the concatenation of PPN and PPO.❌ Incorrect
During translation, the offset (VPO) stays the same (becomes PPO), while the VPN is mapped to a PPN.Case Study: Advanced Heap Management & VM Translation
Optimizing Allocators and Address Translation Efficiency
You are designing a high-performance allocator for an Intel Core i7 system with a k-level page table hierarchy. The allocator uses a single global variable (heap_listp) and must minimize external fragmentation using Buddy System Allocation. You also need to modify the block structure to improve Peak memory utilization.
Q
1. Implement the 'find_fit' function for a simple implicit list allocator using first-fit logic. Your solution must account for Internal fragmentation and return NULL if no block is found.
Solution:
Implementation: static void *find_fit(size_t asize) { void *bp; /* Iterate through the heap starting from heap_listp */ for (bp = heap_listp; GET_SIZE(HDRP(bp)) > 0; bp = NEXT_BLKP(bp)) { /* If block is not allocated and size is sufficient */ if (!GET_ALLOC(HDRP(bp)) && (asize <= GET_SIZE(HDRP(bp)))) { return bp; } } return NULL; /* No fit found */ } This function searches the linear heap space, balancing search speed with memory utilization.
Implementation: static void *find_fit(size_t asize) { void *bp; /* Iterate through the heap starting from heap_listp */ for (bp = heap_listp; GET_SIZE(HDRP(bp)) > 0; bp = NEXT_BLKP(bp)) { /* If block is not allocated and size is sufficient */ if (!GET_ALLOC(HDRP(bp)) && (asize <= GET_SIZE(HDRP(bp)))) { return bp; } } return NULL; /* No fit found */ } This function searches the linear heap space, balancing search speed with memory utilization.
Q
2. Explain how to modify an allocator to remove boundary tags (footers) from allocated blocks. How does this affect Peak memory utilization and the mm_free(void *bp) logic?
Solution:
To remove footers in allocated blocks, we store an 'extra' bit in the header of the *next* block to indicate whether the *current* block is allocated. This allows us to reduce the minimum block size by one word, directly increasing Peak memory utilization by reducing internal fragmentation in small objects. However, mm_free and coalesce(bp) must now inspect the header of the next block to determine if the previous block can be merged, maintaining constant-time coalescing for free blocks (which still require footers).
To remove footers in allocated blocks, we store an 'extra' bit in the header of the *next* block to indicate whether the *current* block is allocated. This allows us to reduce the minimum block size by one word, directly increasing Peak memory utilization by reducing internal fragmentation in small objects. However, mm_free and coalesce(bp) must now inspect the header of the next block to determine if the previous block can be merged, maintaining constant-time coalescing for free blocks (which still require footers).
Q
3. Describe the 'Buddy System Allocation' process for a block of size 32 bytes at address xxx...x10000. How does it handle recursive splitting?
Solution:
The Buddy System finds a block of size $2^j$ where $k \le j \le m$. To allocate 32 bytes ($2^5$), if only a $2^7$ block is available, it splits into two $2^6$ buddies. One $2^6$ block is then split again into two $2^5$ buddies. One buddy at address xxx...x10000 is allocated, while its buddy (address xxx...x11000) remains free. This minimizes external fragmentation and allows for fast buddy-merging using bitwise address operations.
The Buddy System finds a block of size $2^j$ where $k \le j \le m$. To allocate 32 bytes ($2^5$), if only a $2^7$ block is available, it splits into two $2^6$ buddies. One $2^6$ block is then split again into two $2^5$ buddies. One buddy at address xxx...x10000 is allocated, while its buddy (address xxx...x11000) remains free. This minimizes external fragmentation and allows for fast buddy-merging using bitwise address operations.